Results 1 - 13 of 13
1.
G3 (Bethesda) ; 14(3), 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38180089

ABSTRACT

Many genetic models (including models for epistatic effects as well as genetic-by-environment) involve covariance structures that are Hadamard products of lower rank matrices. Implementing these models requires factorizing large Hadamard product matrices. The available algorithms for factorization do not scale well for big data, making the use of some of these models not feasible with large sample sizes. Here, based on properties of Hadamard products and (related) Kronecker products, we propose an algorithm that produces an approximate decomposition that is orders of magnitude faster than the standard eigenvalue decomposition. In this article, we describe the algorithm, show how it can be used to factorize large Hadamard product matrices, present benchmarks, and illustrate the use of the method by presenting an analysis of data from the northern testing locations of the G × E project from the Genomes to Fields Initiative (n ∼ 60,000). We implemented the proposed algorithm in the open-source "tensorEVD" R package.
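The link between Hadamard and Kronecker products mentioned in this abstract can be illustrated with a small NumPy sketch. This is not the tensorEVD implementation; it is a minimal illustration under assumed names (`K1`, `K2`, `i1`, `i2`, and the rank `k` are hypothetical) and simulated kernels. It shows that a Hadamard product of two expanded kernels is a submatrix of their Kronecker product, so a basis for it can be built from the eigen-decompositions of the two small kernels, and truncating that basis gives an approximation.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_psd(n, rng):
    """Simulated positive semi-definite kernel (illustrative only)."""
    a = rng.standard_normal((n, n))
    return a @ a.T / n

# Hypothetical sizes: n1 genotypes, n2 environments, n records.
n1, n2, n = 4, 3, 10
K1 = random_psd(n1, rng)          # e.g. a genomic kernel
K2 = random_psd(n2, rng)          # e.g. an environmental kernel
i1 = rng.integers(0, n1, size=n)  # genotype index of each record
i2 = rng.integers(0, n2, size=n)  # environment index of each record

# Hadamard product of the two kernels expanded to the record level.
K = K1[np.ix_(i1, i1)] * K2[np.ix_(i2, i2)]

# The same matrix is a submatrix of the Kronecker product K1 (x) K2.
idx = i1 * n2 + i2
K_kron = np.kron(K1, K2)[np.ix_(idx, idx)]

# Eigen-decomposition of the Kronecker product from the two small ones.
d1, V1 = np.linalg.eigh(K1)
d2, V2 = np.linalg.eigh(K2)
d = np.kron(d1, d2)               # products of eigenvalues
X = np.kron(V1, V2)[idx, :]       # rows of the Kronecker eigenvectors per record
K_rebuilt = (X * d) @ X.T         # exact reconstruction of K

# Keeping only the k largest eigenvalue products gives an approximation.
k = 6
keep = np.argsort(d)[::-1][:k]
K_approx = (X[:, keep] * d[keep]) @ X[:, keep].T
rel_err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(f"relative Frobenius error with {k} of {n1 * n2} components: {rel_err:.3f}")
```

Note that the selected rows of the Kronecker eigenvectors are generally not orthogonal, so the truncated basis yields an approximate factorization rather than a true eigenvalue decomposition, which is consistent with the abstract's description of an approximate decomposition.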


Subject(s)
Algorithms , Genetic Models , Genome , Sample Size
2.
Nat Commun ; 14(1): 6904, 2023 Oct 30.
Article in English | MEDLINE | ID: mdl-37903778

ABSTRACT

Genotype-by-environment (G×E) interactions can significantly affect crop performance and stability. Investigating G×E requires extensive data sets with diverse cultivars tested over multiple locations and years. The Genomes-to-Fields (G2F) Initiative has tested maize hybrids in more than 130 year-locations in North America since 2014. Here, we curate and expand this data set by generating environmental covariates (using a crop model) for each of the trials. The resulting data set includes DNA genotypes and environmental data linked to more than 70,000 phenotypic records of grain yield and flowering traits for more than 4000 hybrids. We show how this valuable data set can serve as a benchmark in agricultural modeling and prediction, paving the way for countless G×E investigations in maize. We use multivariate analyses to characterize the data set's genetic and environmental structure, study the association of key environmental factors with traits, and provide benchmarks using genomic prediction models.


Subject(s)
Gene-Environment Interaction , Zea mays , Zea mays/genetics , Genotype , Phenotype , Genomics/methods
3.
BMC Res Notes ; 16(1): 148, 2023 Jul 17.
Article in English | MEDLINE | ID: mdl-37461058

ABSTRACT

OBJECTIVES: The Genomes to Fields (G2F) 2022 Maize Genotype by Environment (GxE) Prediction Competition aimed to develop models for predicting grain yield for the 2022 Maize GxE project field trials, leveraging the datasets previously generated by this project and other publicly available data. DATA DESCRIPTION: This resource used data from the Maize GxE project within the G2F Initiative [1]. The dataset included phenotypic and genotypic data of the hybrids evaluated in 45 locations from 2014 to 2022. It also included soil, weather, and environmental covariate data, as well as metadata for all environments (each a combination of year and location). Competitors also had access to ReadMe files describing all the files provided. The Maize GxE is a collaborative project, and all the data it generates become publicly available [2]. The dataset used in the 2022 Prediction Competition was curated and lightly filtered for quality and to ensure naming uniformity across years.


Subject(s)
Plant Genome , Zea mays , Phenotype , Zea mays/genetics , Genotype , Plant Genome/genetics , Edible Grain/genetics
4.
Plant Genome ; 15(4): e20254, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36043341

ABSTRACT

The success of genomic selection (GS) in breeding schemes relies on its ability to provide accurate predictions of unobserved lines at early stages. Multigeneration data provides opportunities to increase the training data size and thus the likelihood of extracting useful information from ancestors to improve prediction accuracy. The genomic best linear unbiased predictions (GBLUPs) are performed by borrowing information through kinship relationships between individuals. Multigeneration data usually becomes heterogeneous, with complex family relationship patterns that are increasingly entangled with each generation. Under these conditions, historical data may not be optimal for model training as the accuracy could be compromised. The sparse selection index (SSI) is a method for training set (TRN) optimization in which training individuals provide predictions to some but not all predicted subjects. We added a trimming process to the original SSI (trimmed SSI) to remove training individuals that are less important for prediction. Using a large multigeneration (8 yr) wheat (Triticum aestivum L.) grain yield dataset (n = 68,836), we found increases in accuracy as more years are included in the TRN, with improvements of ∼0.05 in the GBLUP accuracy when using 5 yr of historical data relative to when using only 1 yr. The SSI method showed a small gain over the GBLUP accuracy but with an important reduction in TRN size. These reduced TRNs were formed with a similar number of subjects from each training generation. Our results suggest that the SSI provides a more stable ranking of genotypes than the GBLUP as the TRN becomes larger.


Subject(s)
Plant Breeding , Triticum , Triticum/genetics , Plant Breeding/methods , Phenotype , Genomics/methods , Genome
5.
Mol Breed ; 42(12): 71, 2022 Dec.
Article in English | MEDLINE | ID: mdl-37313322

ABSTRACT

Oil palm is the most important oil crop worldwide. Colombia is the fourth largest producer, primarily relying on production from interspecific hybrids derived from crosses between Elaeis oleifera and Elaeis guineensis (OxG). However, conventional breeding can take up to 20 years to generate a new variety. Therefore, reducing the breeding cycle while improving the genetic gain for complex traits is desirable. Genomic selection (GS) is an approach with the potential to achieve this goal. In this study, we evaluated 431 F1 interspecific hybrids (OxG) and 444 backcrosses (BC1) for morphological and yield-related traits. Genomic predictions were performed with the G-BLUP model using three different population datasets for training the model: the same population (TRN1), the other population (TRN2), and both populations (TRN1+2). Higher multi-family prediction accuracies were obtained for foliar area (0.3 in OxG) and trunk height (0.47 in BC1) when the model was trained with TRN1. Single-family prediction accuracies were lower in the OxG compared to BC1 families for traits such as trunk diameter, trunk height, bunch number, and yield using TRN1. Conversely, lower prediction accuracies were obtained for most traits when the model was trained using TRN2 (< 0.1). Multi-trait models showed a substantial increase in prediction accuracy for traits such as yield (0.22 for OxG and 0.44 for BC1) because of the genetic correlations between traits. The results highlight the potential of GS for parental selection in OxG and BC1 populations, but further studies are required to improve the models to select individuals by their genetic value. Supplementary Information: The online version contains supplementary material available at 10.1007/s11032-022-01341-5.

6.
Heredity (Edinb) ; 127(5): 423-432, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34564692

ABSTRACT

Genomic prediction models are often calibrated using multi-generation data. Over time, as data accumulates, training data sets become increasingly heterogeneous. Differences in allele frequency and linkage disequilibrium patterns between the training and prediction genotypes may limit prediction accuracy. This leads to the question of whether all available data or a subset of it should be used to calibrate genomic prediction models. Previous research on training set optimization has focused on identifying a subset of the available data that is optimal for a given prediction set. However, this approach does not contemplate the possibility that different training sets may be optimal for different prediction genotypes. To address this problem, we recently introduced a sparse selection index (SSI) that identifies an optimal training set for each individual in a prediction set. Using additive genomic relationships, the SSI can provide increased accuracy relative to genomic-BLUP (GBLUP). Non-parametric genomic models using Gaussian kernels (KBLUP) have, in some cases, yielded higher prediction accuracies than standard additive models. Therefore, here we studied whether combining SSIs and kernel methods could further improve prediction accuracy when training genomic models using multi-generation data. Using four years of doubled haploid maize data from the International Maize and Wheat Improvement Center (CIMMYT), we found that when predicting grain yield the KBLUP outperformed the GBLUP, and that using SSI with additive relationships (GSSI) led to 5-17% increases in accuracy relative to the GBLUP. However, differences in prediction accuracy between the KBLUP and the kernel-based SSI were smaller and not always significant.


Subject(s)
Genetic Models , Zea mays , Genome , Genomics , Phenotype , Single Nucleotide Polymorphism , Zea mays/genetics
7.
Genetics ; 218(1), 2021 May 17.
Article in English | MEDLINE | ID: mdl-33748861

ABSTRACT

Genomic prediction uses DNA sequences and phenotypes to predict genetic values. In homogeneous populations, theory indicates that the accuracy of genomic prediction increases with sample size. However, differences in allele frequencies and linkage disequilibrium patterns can lead to heterogeneity in SNP effects. In this context, calibrating genomic predictions using a large, potentially heterogeneous, training data set may not lead to optimal prediction accuracy. Some studies tried to address this sample size/homogeneity trade-off using training set optimization algorithms; however, this approach assumes that a single training data set is optimum for all individuals in the prediction set. Here, we propose an approach that identifies, for each individual in the prediction set, a subset from the training data (i.e., a set of support points) from which predictions are derived. The methodology that we propose is a sparse selection index (SSI) that integrates selection index methodology with sparsity-inducing techniques commonly used for high-dimensional regression. The sparsity of the resulting index is controlled by a regularization parameter (λ); the G-Best Linear Unbiased Predictor (G-BLUP) (the prediction method most commonly used in plant and animal breeding) appears as a special case which happens when λ = 0. In this study, we present the methodology and demonstrate (using two wheat data sets with phenotypes collected in 10 different environments) that the SSI can achieve significant (anywhere between 5 and 10%) gains in prediction accuracy relative to the G-BLUP.
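The role of the regularization parameter λ described in this abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' software: it assumes the index weights for one prediction individual minimize 0.5 β'Pβ - g'β + λ‖β‖₁, where `P` is the genomic relationship matrix of the training set plus a variance ratio `theta` on the diagonal and `g` holds the relationships between the training individuals and the predicted individual. The objective, the coordinate-descent solver, the marker data, and all names are assumptions made for illustration.

```python
import numpy as np

def soft_threshold(x, lam):
    """Soft-thresholding operator used by L1-penalized coordinate descent."""
    return np.sign(x) * max(abs(x) - lam, 0.0)

def sparse_index(P, g, lam, n_iter=1000, tol=1e-12):
    """Coordinate descent for 0.5*b'Pb - g'b + lam*||b||_1 (illustrative)."""
    beta = np.zeros(len(g))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(g)):
            # Partial residual for coordinate j (excludes its own contribution).
            r = g[j] - P[j] @ beta + P[j, j] * beta[j]
            new = soft_threshold(r, lam) / P[j, j]
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta

# Simulated markers; sizes and the variance ratio theta are hypothetical.
rng = np.random.default_rng(1)
n_trn, n_tst, n_markers = 25, 5, 200
M = rng.standard_normal((n_trn + n_tst, n_markers))
G = M @ M.T / n_markers                  # genomic relationship matrix
theta = 1.0                              # assumed residual-to-genetic variance ratio
P = G[:n_trn, :n_trn] + theta * np.eye(n_trn)
g = G[:n_trn, n_trn]                     # training vs. first prediction individual

beta_gblup = sparse_index(P, g, lam=0.0)        # lambda = 0: no sparsity
beta_closed = np.linalg.solve(P, g)             # closed-form G-BLUP-type weights
beta_sparse = sparse_index(P, g, lam=0.5 * np.max(np.abs(g)))
beta_empty = sparse_index(P, g, lam=np.max(np.abs(g)))

print("non-zero weights, lambda = 0:  ", np.count_nonzero(beta_gblup))
print("non-zero weights, intermediate:", np.count_nonzero(beta_sparse))
print("non-zero weights, large lambda:", np.count_nonzero(beta_empty))
```

With λ = 0 the solver recovers the closed-form weights, which is the G-BLUP special case noted in the abstract; larger values of λ shrink weights to exactly zero, so each predicted individual ends up with its own subset of supporting training individuals.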


Subject(s)
Genetic Testing/methods , Genomics/methods , Algorithms , Alleles , Gene Frequency , Linkage Disequilibrium , Genetic Models , Theoretical Models , Single Nucleotide Polymorphism , Quantitative Trait Loci , Genetic Selection/genetics , Triticum/genetics
8.
Sci Rep ; 10(1): 8195, 2020 May 18.
Article in English | MEDLINE | ID: mdl-32424224

ABSTRACT

High-throughput phenotyping (HTP) technologies can produce data on thousands of phenotypes per unit being monitored. These data can be used to breed for economically and environmentally relevant traits (e.g., drought tolerance); however, incorporating high-dimensional phenotypes in genetic analyses and in breeding schemes poses important statistical and computational challenges. To address this problem, we developed regularized selection indices; the methodology integrates techniques commonly used in high-dimensional phenotypic regressions (including penalization and rank-reduction approaches) into the selection index (SI) framework. Using extensive data from CIMMYT's (International Maize and Wheat Improvement Center) wheat breeding program we show that regularized SIs derived from hyper-spectral data offer consistently higher accuracy for grain yield than those achieved by standard SIs, and by vegetation indices commonly used to predict agronomic traits. Regularized SIs offer an effective approach to leverage HTP data that is routinely generated in agriculture; the methodology can also be used to conduct genetic studies using high-dimensional phenotypes that are often collected in humans and model organisms including body images and whole-genome gene expression profiles.


Subject(s)
Molecular Imaging , Phenotype , Plant Breeding , Agriculture
9.
Genes (Basel) ; 11(1), 2020 Jan 05.
Article in English | MEDLINE | ID: mdl-31948110

ABSTRACT

Sorghum is one of the world's major crops, expresses traits for resilience to climate change, and can be used for several purposes including food and clean fuels. Multiple-trait genomic prediction and selection models were implemented using genotyping-by-sequencing single nucleotide polymorphism markers and phenotypic data. We demonstrated for the first time the efficiency of genomic selection modelling of index selection including biofuel traits such as aboveground biomass yield, plant height, and dry mass fraction of the fresh material. This work also sheds light, for the first time, on the promising potential of using information from populations grown from seed to predict the performance of populations regrown from the rhizomes, even two winter seasons after the original trial was sown. Genomic selection modelling of the optimum index selection including the three traits of interest (plant height, aboveground dry biomass yield, and dry mass fraction of fresh mass material) was the most promising. Since the plant characteristics evaluated herein are routinely measured in cereals and other plant species of agricultural interest, it can be inferred that the findings can be transferred to other major crops.


Subject(s)
Genetic Testing/methods , Sorghum/genetics , Sorghum/metabolism , Biofuels , Biomass , Agricultural Crops/genetics , Edible Grain/genetics , Forecasting/methods , Plant Genome/genetics , Genomics/methods , Genotype , Phenotype , Single Nucleotide Polymorphism/genetics , Quantitative Trait Loci/genetics
10.
Genes (Basel) ; 10(11), 2019 Oct 24.
Article in English | MEDLINE | ID: mdl-31653099

ABSTRACT

The purpose of this work was to assess the performance of four genomic selection (GS) models (GBLUP, BRR, Bayesian LASSO and BayesB) for four sorghum grain antioxidant traits (phenols, flavonoids, total antioxidant capacity and condensed tannins) using whole-genome SNP markers in a novel diversity panel of Sorghum bicolor lines and landraces and S. bicolor × S. halepense recombinant inbred lines. One key breeding problem modelled was predicting the antioxidant production of new and unphenotyped sorghum genotypes (validation set). The population was weakly structured (analysis of molecular variance, AMOVA R² = 9%), showed significant genetic diversity, and expressed antioxidant traits with a good level of variability and high correlation. The S. bicolor × S. halepense lines outperformed Sorghum bicolor populations for all the antioxidants. The four GS models implemented in this work performed comparably across traits, with accuracies ranging from 0.49 to 0.58, which are considered high enough to sustain sorghum breeding for antioxidant production and allow important genetic gains per unit of time and cost. The results presented in this work are expected to contribute to GS implementation and the genetic improvement of sorghum grain antioxidants for different purposes, including the manufacture of health-promoting and specialty foods.


Subject(s)
Agricultural Crops/genetics , Flavonoids/biosynthesis , Plant Breeding/methods , Artificial Selection , Sorghum/genetics , Flavonoids/genetics , Genetic Hybridization , Hydroxybenzoates/metabolism , Genetic Polymorphism , Quantitative Trait Loci , Tannins/biosynthesis , Tannins/genetics
11.
G3 (Bethesda) ; 8(7): 2471-2481, 2018 Jul 02.
Article in English | MEDLINE | ID: mdl-29794167

ABSTRACT

Potato (Solanum tuberosum) is a staple food crop and is considered one of the main sources of carbohydrates worldwide. Late blight (Phytophthora infestans) and common scab (Streptomyces scabies) are two of the primary production constraints faced by potato farming. Previous studies have identified a few resistance genes for both late blight and common scab; however, these genes explain only a limited fraction of the heritability of these diseases. Genomic selection has been demonstrated to be an effective methodology for breeding value prediction in many major crops (e.g., maize and wheat). However, the technology has received little attention in potato breeding. We present the first genomic selection study involving late blight and common scab in tetraploid potato. Our data involve 4,110 single nucleotide polymorphisms (SNPs) and phenotypic field evaluations for late blight (n=1,763) and common scab (n=3,885) collected in seven and nine years, respectively. We report moderately high genomic heritability estimates (0.46 ± 0.04 and 0.45 ± 0.017, for late blight and common scab, respectively). The extent of genotype-by-year interaction was high for late blight and low for common scab. Our assessment of prediction accuracy demonstrates the applicability of genomic prediction for tetraploid potato breeding. For both traits, we found that more than 90% of the genetic variance could be captured with an additive model. For common scab, the highest prediction accuracy was achieved using an additive model. For late blight, small but statistically significant gains in prediction accuracy were achieved using a model that accounted for both additive and dominance effects. Using whole-genome regression models, we identified SNPs located in previously reported hotspot regions for late blight, on genes associated with systemic disease resistance responses, and a new locus located in a WRKY transcription factor for common scab.


Subject(s)
Disease Resistance/genetics , Plant Genome , Genomics , Plant Diseases/genetics , Genetic Selection , Solanum tuberosum/genetics , Tetraploidy , Algorithms , Genomics/methods , Genotype , Genetic Models , Plant Diseases/microbiology , Single Nucleotide Polymorphism , Reproducibility of Results , Solanum tuberosum/microbiology , Streptomyces
12.
G3 (Bethesda) ; 5(4): 569-82, 2015 Feb 06.
Article in English | MEDLINE | ID: mdl-25660166

ABSTRACT

Genomic selection (GS) models use genome-wide genetic information to predict genetic values of candidates of selection. Originally, these models were developed without considering genotype × environment interaction (G×E). Several authors have proposed extensions of the single-environment GS model that accommodate G×E using either covariance functions or environmental covariates. In this study, we model G×E using a marker × environment interaction (M×E) GS model; the approach is conceptually simple and can be implemented with existing GS software. We discuss how the model can be implemented by using an explicit regression of phenotypes on markers or using covariance structures (a genomic best linear unbiased prediction-type model). We used the M×E model to analyze three CIMMYT wheat data sets (W1, W2, and W3), where more than 1000 lines were genotyped using genotyping-by-sequencing and evaluated at CIMMYT's research station in Ciudad Obregon, Mexico, under simulated environmental conditions that covered different irrigation levels, sowing dates and planting systems. We compared the M×E model with a stratified (i.e., within-environment) analysis and with a standard (across-environment) GS model that assumes that effects are constant across environments (i.e., ignoring G×E). The prediction accuracy of the M×E model was substantially greater than that of an across-environment analysis that ignores G×E. Depending on the prediction problem, the M×E model had either similar or greater levels of prediction accuracy than the stratified analyses. The M×E model decomposes marker effects and genomic values into components that are stable across environments (main effects) and others that are environment-specific (interactions). Therefore, in principle, the interaction model could shed light on which variants have effects that are stable across environments and which ones are responsible for G×E.
The data set and the scripts required to reproduce the analysis are publicly available as Supporting Information.
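The two implementations mentioned in this abstract (explicit regression on markers versus covariance structures) can be related with a short NumPy sketch. It is a minimal illustration under assumptions not stated in the abstract: a balanced layout where every line is observed in every environment, simulated markers, and hypothetical names (`M`, `X0`, `X_env`, `K_main`, `K_env`). It is not the scripts distributed with the article.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical balanced layout: each line is observed in each environment.
n_lines, n_markers, n_env = 5, 40, 3
M = rng.standard_normal((n_lines, n_markers))   # simulated marker matrix
line = np.tile(np.arange(n_lines), n_env)       # line index of each record
env = np.repeat(np.arange(n_env), n_lines)      # environment index of each record

# Explicit-regression form: one set of marker covariates for main effects,
# plus one set per environment that is zeroed outside that environment.
X0 = M[line]
X_env = [X0 * (env == e)[:, None] for e in range(n_env)]

# Covariance-structure form implied by those covariates.
K_main = X0 @ X0.T / n_markers
K_env = [X @ X.T / n_markers for X in X_env]

# The main-effect kernel is the genomic relationship matrix expanded to records,
# and the interaction kernels add up to the same kernel restricted to pairs of
# records from the same environment.
G = M @ M.T / n_markers
same_env = env[:, None] == env[None, :]
print(np.allclose(K_main, G[np.ix_(line, line)]))
print(np.allclose(sum(K_env), K_main * same_env))
```

In this sketch the main-effect kernel lets records borrow information across environments, while each interaction kernel only connects records within one environment, which mirrors the split into stable and environment-specific components described in the abstract.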


Subject(s)
Gene-Environment Interaction , Plant Genome , Genetic Models , Triticum/genetics , Breeding , Genotype , Phenotype , Genetic Selection , Software
13.
G3 (Bethesda) ; 3(11): 1903-26, 2013 Nov 06.
Article in English | MEDLINE | ID: mdl-24022750

ABSTRACT

Genotyping-by-sequencing (GBS) technologies have proven capacity for delivering large numbers of marker genotypes with potentially less ascertainment bias than standard single nucleotide polymorphism (SNP) arrays. Therefore, GBS has become an attractive alternative technology for genomic selection. However, the use of GBS data poses important challenges, and the accuracy of genomic prediction using GBS is currently undergoing investigation in several crops, including maize, wheat, and cassava. The main objective of this study was to evaluate various methods for incorporating GBS information and compare them with pedigree models for predicting genetic values of lines from two maize populations evaluated for different traits measured in different environments (experiments 1 and 2). Given that GBS data come with a large percentage of uncalled genotypes, we evaluated methods using nonimputed, imputed, and GBS-inferred haplotypes of different lengths (short or long). GBS and pedigree data were incorporated into statistical models using either the genomic best linear unbiased predictors (GBLUP) or the reproducing kernel Hilbert spaces (RKHS) regressions, and prediction accuracy was quantified using cross-validation methods. 
The following results were found: relative to pedigree or marker-only models, there were consistent gains in prediction accuracy by combining pedigree and GBS data; there was increased predictive ability when using imputed or nonimputed GBS data over inferred haplotype in experiment 1, or nonimputed GBS and information-based imputed short and long haplotypes, as compared to the other methods in experiment 2; the level of prediction accuracy achieved using GBS data in experiment 2 is comparable to those reported by previous authors who analyzed this data set using SNP arrays; and GBLUP and RKHS models with pedigree with nonimputed and imputed GBS data provided the best prediction correlations for the three traits in experiment 1, whereas for experiment 2 RKHS provided slightly better prediction than GBLUP for drought-stressed environments, and both models provided similar predictions in well-watered environments.


Subject(s)
Plant Genome , Zea mays/genetics , Breeding , Chromosomes/chemistry , Chromosomes/metabolism , Genotype , Haplotypes , Genetic Models , Phenotype , Single Nucleotide Polymorphism , DNA Sequence Analysis